| nettime's_dusty_archivist on Mon, 20 Mar 2000 07:32:44 +0100 (CET) |
| <nettime> The Breaking of Cyber Patrol® 4 [part 2 of 2] |
[orig from <http://hem.passagen.se/eddy1/reveng/cp4/cp4break.html>]
[part 2 of 2]
If we were to go another step back we would get a record like this:
0x4348, 0x00000000, 0x00030103
This clashes with the structure as we know it, and so we assume that
there are only three records, the data before them having some other
structure. Looking backwards again, we notice that the word immediately
before the first table entry is 0x0003, which could be a count of the
tables. Checking against another file with the same structure,
hotlist.not, confirmed this assumption.
The little that is left of the header matters less than locating the
table entries and their count, but the 0x2A at offset 0x02 seems to be
the header size, assuming the header starts at 0x02 and the two bytes
in front of it are unrelated to it. The "CH" seems to be a marker;
hotlist.not contains "HH" instead. Without more files to compare
against, or time-consuming debugging of the executable, the few bytes
left unaccounted for will remain a "mystery".
We learned several important things from the newsgroups list. First,
Microsystems likes putting length bytes on things. Second, the
blocking mask 0x000E (corresponding to "Partial Nudity", "Full
Nudity", and "Sexual Acts / Text") is the most popular one. It appears
that that's the generic "porn" label which they slap on everything
that looks like it might be porn, whether it technically applies or
not. Both these facts were useful in attacking the other two tables in
cyber.not.
The first table mentioned in the header is the biggest one. At over
half a megabyte, it makes up most of the bulk of the cyber.not file.
As our previous measurements indicated, this table includes a lot of
repeats at a distance of six or seven bytes. Character frequency
counts revealed that the top three characters in table 1 are:
1. 0x00 (106280 times)
2. 0x0E (65483 times)
3. 0x07 (25212 times)
We know that they like using blocking mask 0x000E, and the bytes
making up that number are the top two most frequent bytes in the
table. We know they like length bytes, we know there's some kind of
structure in here with a size of seven bytes, and 0x07 is the third
most frequent byte value. This looks promising. Let's look at a hex
dump. This dump was generated with the Linux od -Ax -txC command;
offsets are from the start of table 1 as specified in the cyber.not
header.
000000 53 44 0a 00 03 c7 00 00 07 0e 00 99 37 55 67 00
000010 0a 0a 0a 0a 0e 0c 0b 67 73 76 00 00 07 0e 00 51
000020 b1 f1 6d 00 0c 0a 79 c8 0e 00 0c 0a 9e 09 00 00
000030 0b 01 00 89 84 e0 4e 55 9e 53 d8 00 0c 0a bd 05
000040 00 00 07 0e 00 71 aa 8a 2a 00 0c 0b b8 18 00 00
000050 0b 08 00 ea 1e da d8 d4 fc d4 20 00 0c 0b b8 1a
000060 00 00 07 00 04 e0 3d c1 be 07 08 00 7b 75 fd b7
000070 07 00 04 87 0b 1e ef 00 0c 0b b8 1f 0e 00 0c 0b
000080 b8 2b 08 00 0c 0b b8 2c 0e 00 0c 0b b8 36 08 00
000090 0c 0d 78 02 00 00 07 0e 00 13 53 03 e2 00 0c 0d
0000a0 79 06 00 04 0c 0d ab 97 00 00 07 06 00 31 75 fc
0000b0 80 00 0c 0d 13 5a 0e 00 0c 0e c7 33 0e 00 0c 0e
0000c0 c8 02 00 00 07 0e 00 22 39 82 eb 00 0c 0e e1 0d
0000d0 00 00 07 01 00 0d b0 59 21 00 0c 0e e8 32 00 00
0000e0 07 20 00 7c d3 df f8 00 0c 0f 87 cd 00 00 07 0e
0000f0 00 88 35 ae 33 00 0c 0f c1 72 0e 00 0c 10 a0 d8
This may appear quite formidable to someone unaccustomed to reading
hex dumps, but careful examination reveals some interesting things.
First of all, the sequence "0e 00" occurs quite frequently. It's
reasonable to suppose that that might be the blocking mask for a page
or site. Another common one is "07 0e 00". When that one occurs, there
are often four more bytes and then those three again. These patterns
are easier to see when one examines more of the dump than the short
sample here.
It's reasonable to guess that the 07 is a length byte, just like in
the newsgroup list. But that doesn't explain why we get so many
repeats at distance six. The byte value 0x06 is only the 39th most
common value in table 1, even though there are far more repeats at
distance six than seven. So not everything can be tagged with a length
byte, or there's something else we don't understand.
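Both measurements used here, byte frequency counts and counts of
repeats at a given distance, are easy to reproduce. The following is a
minimal sketch in Python, not the program we actually ran; the function
names are made up for illustration, a "repeat at distance d" is assumed
to mean a byte equal to the byte d positions earlier, and the sample
data is synthetic.

```python
from collections import Counter

def byte_frequencies(data: bytes, top: int = 3):
    """Return the `top` most common byte values with their counts."""
    return Counter(data).most_common(top)

def repeats_at_distance(data: bytes, distance: int) -> int:
    """Count positions whose byte equals the byte `distance` places earlier."""
    return sum(
        1
        for i in range(distance, len(data))
        if data[i] == data[i - distance]
    )

# Synthetic sample: four six-byte records that differ in one byte only.
sample = b"".join(bytes([0x00, 0xCF, 0xCC, 0xAE, n, 0x0E]) for n in range(4))

print(byte_frequencies(sample, 1))
for d in (5, 6, 7):
    print(d, repeats_at_distance(sample, d))
```

On data built from fixed-size records, the count peaks at the record
size, which is the kind of peak that pointed us at six and seven bytes.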
Further skimming of the hex dump revealed inspirational passages like
this one:
037b50 5c b7 08 6f 00 cf cc ae 13 0e 00 cf cc ae c8 0e
037b60 00 cf cc ae c9 0e 00 cf cc ae ca 0e 00 cf cc ae
037b70 cc 0e 00 cf cc ae cd 0e 00 cf cc ae d0 0e 00 cf
037b80 cc ae 15 0e 00 cf cc ae d8 0e 00 cf cc ae 16 0e
037b90 00 cf cc ae 18 0e 00 cf cc ae 1b 0e 00 cf cc ae
037ba0 1d 0e 00 cf cc ae 1e 0e 00 cf cc ae 1f 0e 00 cf
037bb0 cc ae 21 1e 00 cf cc ae 23 0e 00 cf cc ae 24 0e
037bc0 00 cf cc ae 27 0e 00 cf cc ae 28 0e 00 cf cc ae
037bd0 30 0e 00 cf cc 13 ea 0e 00 cf cc c4 c4 0e 00 cf
037be0 cc d0 a0 0e 00 cf cc d0 f8 0e 00 cf cc d2 64 1f
037bf0 00 cf cc d2 a0 00 00 07 0e 00 e5 b0 e3 10 00 cf
037c00 cc d2 18 0e 00 cf cc d2 19 0e 00 cf cc d2 1e 0e
The pattern may be clearer if we look at the bytes six at a time:
037b54 00 cf cc ae 13 0e
037b5a 00 cf cc ae c8 0e
037b60 00 cf cc ae c9 0e
037b66 00 cf cc ae ca 0e
037b6c 00 cf cc ae cc 0e
037b72 00 cf cc ae cd 0e
037b78 00 cf cc ae d0 0e
037b7e 00 cf cc ae 15 0e
037b84 00 cf cc ae d8 0e
037b8a 00 cf cc ae 16 0e
037b90 00 cf cc ae 18 0e
037b96 00 cf cc ae 1b 0e
037b9c 00 cf cc ae 1d 0e
037ba2 00 cf cc ae 1e 0e
037ba8 00 cf cc ae 1f 0e
037bae 00 cf cc ae 21 1e
037bb4 00 cf cc ae 23 0e
037bba 00 cf cc ae 24 0e
037bc0 00 cf cc ae 27 0e
037bc6 00 cf cc ae 28 0e
037bcc 00 cf cc ae 30 0e
Here we've obviously got our generic porn mask of 0x000E, alternating
with four unknown bytes, the last of which often seems to be
incrementing - but not always. Scanning across the table, we saw that
when this kind of six-byte structure occurred, the four mystery bytes
seemed to more or less increment smoothly from the start of the table
to the end. But it was always the last byte that incremented first,
and then the second-to-last, and so on. In other words, the field is
being stored in "big endian" byte order, the exact opposite of the
"little endian" byte order conventional on PCs. Why would a PC
software package bother doing something in big endian when it's
running on a CPU designed for little endian?
At this point we had to depend on intuition. There is one thing that's
32 bits long and big endian everywhere, even on a PC: that is an IP
address. Some computers like big endian and some like little endian,
but it is standard for all Internet protocols to use big endian
regardless of what kind of system they're running on - so that they'll
all be able to talk to each other. An added bit of evidence is that
the actual values of this four-byte field seem to be distributed the
way one would expect IP addresses to be distributed. Lots of them
start with bytes like 0xCF, which puts them right in the popular part
of the Class C IP address space. So, let's write the decimal
equivalents of the supposed IP addresses next to the hex dump:
037b54 00 cf cc ae 13 0e 207.204.174.19
037b5a 00 cf cc ae c8 0e 207.204.174.200
037b60 00 cf cc ae c9 0e 207.204.174.201
037b66 00 cf cc ae ca 0e 207.204.174.202
037b6c 00 cf cc ae cc 0e 207.204.174.204
037b72 00 cf cc ae cd 0e 207.204.174.205
037b78 00 cf cc ae d0 0e 207.204.174.208
037b7e 00 cf cc ae 15 0e 207.204.174.21
037b84 00 cf cc ae d8 0e 207.204.174.216
037b8a 00 cf cc ae 16 0e 207.204.174.22
037b90 00 cf cc ae 18 0e 207.204.174.24
037b96 00 cf cc ae 1b 0e 207.204.174.27
037b9c 00 cf cc ae 1d 0e 207.204.174.29
037ba2 00 cf cc ae 1e 0e 207.204.174.30
037ba8 00 cf cc ae 1f 0e 207.204.174.31
037bae 00 cf cc ae 21 1e 207.204.174.33
037bb4 00 cf cc ae 23 0e 207.204.174.35
037bba 00 cf cc ae 24 0e 207.204.174.36
037bc0 00 cf cc ae 27 0e 207.204.174.39
037bc6 00 cf cc ae 28 0e 207.204.174.40
037bcc 00 cf cc ae 30 0e 207.204.174.48
Notice that these are not in numerical order; 216 is not normally
considered to come between 21 and 22. However, considered as decimal
representations, these addresses are in strict alphabetical order.
This list is the kind of thing you might get if you took a text list
of URLs and passed it through a sort utility designed for text. A
little examination reveals that these six-byte structures in table 1
are strictly in this "text IP" order across the entire table. As a
final confirmation that these numbers are intended to represent IP
addresses, just point a Web browser to a few. Almost all are porn
sites.
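The ordering claim can be checked mechanically. This short Python
sketch rebuilds the dotted-decimal strings from the final bytes in the
hex dump above and compares text order against numeric order; the
variable and function names are only illustrative.

```python
def dotted(ip_bytes: bytes) -> str:
    """Render four raw bytes as a dotted-decimal IP address string."""
    return ".".join(str(b) for b in ip_bytes)

# Last byte of each six-byte record in the dump, in file order.
last_octets = [
    0x13, 0xC8, 0xC9, 0xCA, 0xCC, 0xCD, 0xD0, 0x15, 0xD8, 0x16, 0x18,
    0x1B, 0x1D, 0x1E, 0x1F, 0x21, 0x23, 0x24, 0x27, 0x28, 0x30,
]
addresses = [dotted(bytes([0xCF, 0xCC, 0xAE, n])) for n in last_octets]

# Text order: what a plain string sort would produce.
in_text_order = addresses == sorted(addresses)
# Numeric order: what sorting the raw byte values would produce.
in_numeric_order = last_octets == sorted(last_octets)

print(in_text_order, in_numeric_order)
```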
At this point we had figured out that there were a lot of blocking
masks interspersed with IP addresses in the table, and also a lot of
seven-byte structures starting with a length byte and a blocking mask.
But the remaining four bytes of those seven-byte structures were
apparently not sorted, nor IP addresses, and there were still some
bytes that didn't fit into either kind of structure. So we wrote a
Perl program to dump out the known structures and label the unknown
parts.
The next step was simply to stare at the output and look for patterns.
We saw that the six-byte and seven-byte records often occurred in
blocks of lots of the same kind all together. The unknown part often
seemed to consist of the byte 0x0B followed by a blocking mask and
eight bytes of garbage. We guessed that that might be a third record
type, so we added it to the dumper program, and noticed that the
remaining unknown sequences often seemed to consist of 0x0F, a
blocking mask, and then twelve bytes of garbage. From this we inferred
a general pattern: a length byte (always 3 plus a multiple of 4), a
blocking mask, and then some amount of garbage, always a multiple of
four bytes.
Between this and the six-byte IP/mask pattern, almost all the contents
of table 1 fit some kind of structure. But there were still a bunch of
zero bytes hanging around. A reasonable guess was that these signalled
some kind of "end of structure" condition. It only took a little more
intuition to realise that of the "length byte" records and the "IP
address" records, one logically went inside the other. Unfortunately,
we guessed that the "IP address" records went inside the "length byte"
records, and that confused us for quite a while. Here's part of the
output from our dumping program at this stage:
07 0E 00 0F 25 6B BF
07 0E 00 C8 87 B1 C1 (0501)(0800)(0800)(0000)
0B 02 00 B9 53 9A 71 6A BE 88 54
0B 00 08 B9 53 9A 71 3D 5F E2 F4
0B 00 08 B9 53 9A 71 38 16 1A 41
0B 08 00 B9 53 9A 71 07 B3 CA 02 (000E)(0000)
07 08 00 2F 31 2A 45 (000E)(000E)(0000)
07 0E 00 37 71 0F 71 (000E)(000E)(0008)(0000)
0B 01 00 88 B4 92 0E A6 53 2E 7F (000E)(0000)
07 98 04 08 B0 DD FB
07 08 00 0F E8 F5 82 (0000)
07 09 00 4F DE 86 ED (0000)
07 0E 00 79 1F 36 41
07 0E 00 63 C8 51 C4 (0000)
07 02 00 0A E2 34 93 (000E)(0000)
07 08 00 31 2D E5 BA (000A)(000E)(0800)(0020)(0000)
In this dump, the four-digit numbers in parentheses are abbreviations
for "IP address" records, showing only the blocking mask part. We had
already figured out, although it's a break with the tradition set
elsewhere in the file, that in the six-byte IP address records, the
blocking mask comes at the end instead of the start. Not shown in this
dump is the enormous variability in the number of IP addresses
apparently associated with each "length byte record"; some had dozens,
many had none at all.
Also, although it looks okay in this fragment, there's a critical
problem of how to recognize which records are which. The dumping
program would guess what looked like a plausible IP address, but it
sometimes guessed wrong and produced junk until it happened to
randomly re-synchronize. It appeared that IP records with a blocking
mask of 0x0000 helped signal "OK, length byte records coming now", and
a length byte of 0x00 (not shown here) signalled the start of a list
of IP address records, but these things raised problems because it
appeared that in a list of IP addresses, there would always be one
more address than there were blocking masks. Where would the blocking
mask for the last IP address come from?
Late one night, under the influence of a couple bowls of MSG-saturated
Korean instant noodles ("kimchee" flavour), we realised what we should
have seen all along. The "IP address" records are actually the major
records, and the other records go inside them, as children of a parent
IP address. This makes more logical sense, given the purpose of the
file; the package blocks either an entire IP address, or one or more
subsections of an IP address. Then the rest of the structure fell out
easily.
The basic record contains an IP address and a blocking mask. If the
blocking mask is nonzero, it applies to that entire IP address. If the
blocking mask is zero, then there are a number of subrecords, each
consisting of a length byte, a blocking mask, and one or more
four-byte unknown fields. A length byte of 0x00 terminates the list of
subrecords and signals a new IP address.
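The record layout just described can be walked with a few lines of
code. This is a minimal sketch in Python, not our Perl dumper. It
assumes the blocking mask is stored little endian (consistent with
"0e 00" meaning 0x000E), that the length byte counts itself, and that
the caller has already stripped the "SD" marker in front of the table.
The four-byte fields are kept as hex strings in file order, since
nothing so far tells us their byte order. The function name is
hypothetical; the sample bytes come from the start of the table 1 dump
shown earlier.

```python
import struct

def parse_url_table(data: bytes):
    """Yield (ip, mask, subrecords) for each record in the table.

    Each subrecord is (mask, [four-byte field as hex string, ...]).
    """
    pos = 0
    while pos + 6 <= len(data):
        ip = ".".join(str(b) for b in data[pos:pos + 4])
        (mask,) = struct.unpack_from("<H", data, pos + 4)
        pos += 6
        subrecords = []
        if mask == 0:
            # A zero mask means length-prefixed subrecords follow,
            # terminated by a length byte of 0x00.
            while pos < len(data) and data[pos] != 0:
                length = data[pos]
                if length < 7 or (length - 3) % 4 or pos + length > len(data):
                    raise ValueError(f"bad subrecord length {length} at {pos}")
                (sub_mask,) = struct.unpack_from("<H", data, pos + 1)
                fields = [
                    data[i:i + 4].hex()
                    for i in range(pos + 3, pos + length, 4)
                ]
                subrecords.append((sub_mask, fields))
                pos += length
            pos += 1  # skip the terminating zero byte
        yield ip, mask, subrecords

# First 40 bytes of table 1 after the "SD" marker.
sample = bytes.fromhex(
    "0a0003c70000" "070e0099375567" "00"
    "0a0a0a0a0e0c"
    "0b6773760000" "070e0051b1f16d" "00"
    "0c0a79c80e00"
)
records = list(parse_url_table(sample))
for record in records:
    print(record)
```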
Now, what about those subrecords? Well, they obviously represent some
kind of subdivision of an IP address - like, for instance, a directory
full of Web pages. Here's an entry from table 1, decoded by a more
sophisticated Perl program that also incorporated reverse lookups of
the IP addresses:
207.34.139.253 (pii300.bc1.com):
000E D2A152F4 23AC865E
0002 D2A152F4 9ECA24AB
000E D2A152F4 4337DDA1
001E D2A152F4 F1909EA3
000E D2A152F4 8532C8E2
This particular entry stood out partly because bc1.com is an ISP local
to one of us. We have friends with pages on that system (although not,
as far as we could tell, at the particular URLs blocked by Cyber
Patrol). It also stood out because all the subrecords start with the
same four-byte sequence. That's a pattern that appears in lots of
other entries, too; there will often be a site where several
subrecords start with the same four-byte sequence. Here's a good
example (it's long, so we've left out part):
158.43.192.14 (twister.dial.pipex.net):
[...]
000E 86AC9240
000E 4603712B
0002 D7E769CA
001E 0B01848F
000E 8A1266F1
000E 6DA218B8 957FF449 607AB5ED
000E 6DA218B8 957FF449 E90B0308
000E 6DA218B8 957FF449 D5D0798C
0002 6DA218B8 6A96D698 5F78E699
000E 6DA218B8 6A96D698 CCA4ED77
000E 6DA218B8 118AA2D3 5B69B41C
000E 6DA218B8 3CEC7FA9 48E41B10
000E 6DA218B8 3CEC7FA9 09ED716A
001E 6DA218B8 9B826D61 9BEC198D
000E 6DA218B8 9B826D61 8EF51A8C
000E 6DA218B8 1A7E65EE 8E16AE15
Notice how the four-byte values seem to be grouped together in a
hierarchical structure. Just like directories... It seemed a
reasonable guess that in fact, that's what they were. If they wanted
to block a URL like http://www.foo.com/bar/baz/, maybe they'd do it by
creating a record with the IP address of www.foo.com, and a subrecord
with some representation of the strings "bar" and "baz".
We said "some representation of the strings". What, exactly, does that
mean? Well, it would be quite reasonable to suppose that these
four-byte fields are hashes, similar in nature to the password hashes.
They could feed each URL component into a hash function, store only
the hashes, and then have enhanced security as well as various
efficiency advantages.
We figured out the exact nature of the hash function with the aid of
the bc1.com entry. As you can see above, every subrecord for that
server starts with the hash value 0xD2A152F4. If you look on the
corresponding Web site, you find that it's an ISP's server for user
home pages, all of which are stored in a "users" subdirectory. And it
just so happens that in the nonstandard CRC32 variant that was used as
half of the HQ password hash, the hash of the string "users" is
0xD2A152F4. Problem solved. We've designated this structure
TNotURLEntry.
Above we explain the cryptanalysis of CRC32 in considerable detail,
and we show how to construct, in negligible time, an input that will
generate any output of our choice. As with the passwords, Cyber Patrol
doesn't use any salt for its URL hashes, so we can recognize where
there are duplicate directory names even without reversing the hashes,
and get extra value for each hash we reverse because the same reversal
will be valid for all other occurrences of that hash.
Unfortunately, there is what might be called an "information
theoretic" problem with reversing these hashes. There are many
possible directory names that could generate the same CRC. We can
never be absolutely sure which of several equivalent (same CRC) URLs
was actually meant to be blocked. In the case of the HQ password, we
could use the other half of the hash output to recognize which one was
correct, but here, that doesn't work. In a perverse way, shortening
the hash has actually increased its security. But one good thing for
us as attackers is that of the many possible strings, only a few will
be meaningful. Given the choice between "sex" and "dkbgl~3.a7df", few
would argue with our choice of "sex". For the small number of hashes
which are hashes of very short strings, we can guess that the short
strings are really correct - there are so few possible strings of five
or fewer characters, that they're almost certainly right.
But for most hash values, the CRC32 reversal isn't really very
helpful. For any given hash it generates a long list of possibilities,
most of which are garbage. Instead of sorting through them, we fell
back on the old reliable dictionary attack. We took a list of words
and hashed them all, and then started modifying them by tacking tildes
onto the start (to make it look like user home directories), adding
letters to the start and end, adding ".htm" and ".html" to the end,
and so on.
The source file "cndecode.c" implements this attack on the cyber.not
file, as well as incorporating decryption code, some prettier output
formatting, and (for systems where this works) reverse DNS lookups. It
uses a hash table, and remembers the reversal of each hash for use on
future occurrences of that hash, in an effort to be as efficient as is
reasonable, although the prime emphasis was on expediency in
programming over squeezing out the last CPU cycles.
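The core of the dictionary attack fits in a few lines. The sketch
below is in Python rather than C and is not taken from cndecode.c.
Note the big assumption: it uses the standard CRC32 from zlib as a
stand-in, because Cyber Patrol's nonstandard variant is not reproduced
here, so the hash values it prints will not match cyber.not. The
function names and the small set of word variations are also only
illustrative.

```python
import zlib

def url_hash(component: str) -> int:
    """Stand-in hash: standard zlib CRC32, NOT Cyber Patrol's variant."""
    return zlib.crc32(component.encode("ascii")) & 0xFFFFFFFF

def candidates(word: str):
    """Yield a word plus a few simple variations (tilde, .htm, .html)."""
    for base in (word, "~" + word):
        yield base
        yield base + ".htm"
        yield base + ".html"

def build_reverse_table(words):
    """Hash every candidate once; map hash -> first string that produced it."""
    table = {}
    for word in words:
        for cand in candidates(word):
            table.setdefault(url_hash(cand), cand)
    return table

def reverse_hashes(hashes, table):
    """Look up each hash; unknown hashes map to None."""
    return {h: table.get(h) for h in hashes}

words = ["users", "sex", "omega"]
table = build_reverse_table(words)
targets = [url_hash("users"), url_hash("~omega"), url_hash("sex.html")]
recovered = reverse_hashes(targets, table)
print(recovered)
```

Because the hashes are unsalted, the table is built once and then
checked against every hash in the file.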
As a last resort, if it can't find a hash in the dictionary, the
cndecode program goes through all the possible reverse-CRC values up
to a configurable limit, assigning scores to them based on how
plausible they seem, and then chooses the best. That takes a
relatively long time (a significant fraction of a second) per hash, and
it doesn't really work very well, but it does catch a few that aren't
caught by the dictionary attack. Here's a sample of the output:
************************************************************************
www1.iastate.edu
= 129.186.1.22
0006 http://129.186.1.21/.wmdnl/
000E http://129.186.1.21/~blak/
0008 http://129.186.1.21/~cwhipple/
0820 http://129.186.1.21/~ejackson/
0010 http://129.186.1.21/~ipdpfid/
0001 http://129.186.1.21/2kihan/
000E http://129.186.1.21/~omega/
0008 http://129.186.1.21/~roymeo/
0800 http://129.186.1.21/s(ettk/
0001 http://129.186.1.21/~thinker/
0001: Violence / Profanity
0006: Partial Nudity, Full Nudity
0008: Sexual Acts / Text
000E: Partial Nudity, Full Nudity, Sexual Acts / Text
0010: Gross Depictions / Text
0800: Alcohol & Tobacco
0820: Intolerance, Alcohol & Tobacco
************************************************************************
As this shows, URLs tend to be sorted within a given IP address. The
ones that aren't in sorted order are probably ones for which the
reverse-CRC didn't guess the right reversal. A more sophisticated
version might attempt to detect the sorted order, and force the
reverse-CRC to choose a reversal which would fit into the sorted
order, but the amount of work involved would probably be more than
it's worth.
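The mask legend at the end of the sample output is just a bit-by-bit
reading of the blocking mask. Here is a minimal Python sketch of that
decoding. The bit assignments follow the category statistics table in
section 7; the function name and the treatment of the four reserved
bits are illustrative choices.

```python
# Category names indexed by bit number (bit 0 is the least significant).
CATEGORIES = [
    "Violence / Profanity",               # bit 0
    "Partial Nudity",                     # bit 1
    "Full Nudity",                        # bit 2
    "Sexual Acts / Text",                 # bit 3
    "Gross Depictions / Text",            # bit 4
    "Intolerance",                        # bit 5
    "Satanic or Cult",                    # bit 6
    "Drugs / Drug Culture",               # bit 7
    "Militant / Extremist",               # bit 8
    "Sex Education",                      # bit 9
    "Questionable / Illegal & Gambling",  # bit A
    "Alcohol & Tobacco",                  # bit B
]

def decode_mask(mask: int):
    """Return the list of category names whose bits are set in `mask`."""
    names = [name for bit, name in enumerate(CATEGORIES) if mask & (1 << bit)]
    reserved = mask & ~((1 << len(CATEGORIES)) - 1)
    if reserved:
        names.append(f"reserved bits 0x{reserved:04X}")
    return names

for mask in (0x000E, 0x0820, 0x0010):
    print(f"{mask:04X}: {', '.join(decode_mask(mask))}")
```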
This entry also shows something else we haven't talked about yet:
"alias" IP addresses, which are the apparent purpose of the one
remaining table in cyber.not. Its structure is listed as TNotIPEntry
in the structure tables. Each entry consists of a root IP and one or
more aliases for it. The root IP corresponds to entries in the URL
table, and any resource banned under the root IP will also be banned
under its aliases. The aliases may or may not resolve to the same
machine; the assumption here is that these IPs serve the same pages.
Let's talk briefly about hash collisions. The chance that any two
randomly chosen URL components will happen to have the same hash is
one in 2**32, which is not very likely. This is true even with the
uneven distribution of URLs, because CRC32 is a reasonably good hash
just as a hash, for all its cryptographic weakness. So at first
glance, it doesn't seem like there'll be a big problem of different
URLs having the same hash.
But the birthday paradox comes into play, too. With 2**32 possible
hash values, there starts to be a serious chance of collisions as soon
as the number of hashes gets past 2**16, which is 65536. It's
certainly easy to imagine that a large ISP could have more than that
many user home pages at the same location in their URL tree. Then two
or more different sites would have the same URL as far as Cyber Patrol
is concerned, and any block on one such page would hit the others.
Given the current size of the Net and the size of cyber.not, there
probably aren't any real examples of this kind of problem in the
cyber.not file. But there is very little safety margin. A 64-bit hash
would remove any suggestion of collision risks, at the cost of a
considerable increase in filesize.
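The birthday estimate is easy to put numbers on. This Python sketch
uses the standard approximation p = 1 - exp(-n(n-1)/2N) for n hashes
drawn uniformly from N possible values; treating URL hashes as uniform
random values is an assumption, and the function name is made up.

```python
import math

def collision_probability(n: int, bits: int = 32) -> float:
    """Approximate chance that at least two of n random hashes collide."""
    space = 2.0 ** bits
    # -expm1(-x) equals 1 - exp(-x) but stays accurate for tiny x.
    return -math.expm1(-n * (n - 1) / (2.0 * space))

for n in (1000, 65536, 200000):
    print(n, round(collision_probability(n, 32), 4),
          collision_probability(n, 64))
```

At n = 65536 the 32-bit figure comes out at roughly 39%, while the
64-bit figure is vanishingly small.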
Of course, using a 64-bit hash would improve our ability to attack the
cyber.not file too, by reducing the number of possible URLs for each
hash value. Remember how having the second half of the HQ password
hash made it so much easier to unambiguously reverse the hash?
Information theory makes this tradeoff unavoidable: the fewer possible
collisions, the easier and more unambiguous dictionary attacks will
necessarily become. Given that bytes in cyber.not are somewhat
expensive (because the file has to be transferred to all the users in
updates all the time), the choice of a 32-bit hash is probably
reasonable, even though it has some small risk of creating false
blocks.
A more practical security measure would be to salt the URL hash. In
the section on the HQ password we described how salting that hash
would make dictionary attacks on the password much harder. With the
URL hashes that becomes all the more significant, because with the URL
hashes we aren't attacking just one hash value. We're attacking a few
tens of thousands of hash values all at once. So anywhere we can
recognize that two hashes are the same, that's a win, and any time we
hash a dictionary word, we can easily check it against all the hash
values in cyber.not all at once.
If every URL in cyber.not had been hashed with a different salt value,
then we would have to hash an entire dictionary for every URL instead
of just hashing one dictionary for the entire file. That would raise
our time for a dictionary attack from a few CPU minutes to a few CPU
months - we could still do it, possibly by recruiting a network of
volunteers to compute cooperatively, but not as easily as the present
attacks.
They wouldn't even need to make cyber.not any bigger to get the
benefit of salted hashing - they could just use the offset of each URL
in the cyber.not file as its salt value. Salt doesn't have to be
random or secret, it just has to be different for each hash. They
would also have to upgrade the hash function to one that isn't linear
like CRC32; with CRC32, we could simply figure out the hash of the
salt, XOR it out, and then have an unsalted hash to attack normally. A
much more secure approach, which wouldn't make cyber.not any bigger,
would be to take the offset and the URL, hash them together with SHA1,
and then take the bottom 32 bits of the result.
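Both halves of this argument can be sketched in Python. The first
part checks the linearity that makes a salted CRC32 pointless, using
the standard zlib CRC32 as a stand-in for Cyber Patrol's variant: for
three inputs of equal length, the CRC of their XOR equals the XOR of
their CRCs. The second part is one possible reading of the SHA1
suggestion. How the offset is encoded (four bytes, little endian) and
which digest bytes count as the "bottom 32 bits" (the last four) are
assumptions, and all names are hypothetical.

```python
import hashlib
import struct
import zlib

def crc(data: bytes) -> int:
    """Standard zlib CRC32, used here only as a stand-in."""
    return zlib.crc32(data) & 0xFFFFFFFF

def xor_bytes(a: bytes, b: bytes) -> bytes:
    """XOR two equal-length byte strings."""
    return bytes(x ^ y for x, y in zip(a, b))

# Part 1: CRC32 is affine over XOR, so a salt can be cancelled out.
a, b, c = b"users", b"~blak", b"index"
crc_of_xor = crc(xor_bytes(xor_bytes(a, b), c))
xor_of_crcs = crc(a) ^ crc(b) ^ crc(c)
print(crc_of_xor == xor_of_crcs)

# Part 2: a salted, non-linear 32-bit hash of the same size.
def salted_url_hash(offset: int, component: str) -> int:
    """SHA-1 over (file offset, URL component), truncated to 32 bits."""
    material = struct.pack("<I", offset) + component.encode("ascii")
    digest = hashlib.sha1(material).digest()
    return int.from_bytes(digest[-4:], "big")

# The same directory name at two different offsets.
h1 = salted_url_hash(0x0100, "users")
h2 = salted_url_hash(0x0200, "users")
print(f"{h1:08X} {h2:08X}")
```

With a salt like this, identical directory names at different offsets
no longer share a hash value, so each one has to be attacked
separately.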
But even that wouldn't raise the difficulty of attack above the level
of competent amateurs, and indeed, there is no way to make this kind
of hashing scheme any more secure. There just aren't enough possible
URLs on the Web; it's too easy for attackers to guess all possible
URLs and test them to see which ones would be blocked. Unix sysadmins
accept the fact that attackers can test passwords offline, and attempt
to educate their users to choose hard-to-guess passwords, but
censorware companies cannot ask all objectionable Web sites to choose
hard-to-guess URLs. So they ultimately cannot defend themselves
against this form of attack. With salt in the hashes, though, they
could make it a lot harder for us.
Next, the cyber.yes file contains "positive option" URLs; when the
software is configured to its strictest setting, only these URLs will
be permitted. There is also a list of newsgroups at the end that seems
to be in identical format to the one in cyber.not. A quick scan of the
decrypted file with a text lister showed that it's full of fragments
of ASCII text, like this (dump generated, amusingly enough, by Richard
E. Morris's good old DOS-based HEXEDIT program):
000880: 0B 01 00 7E 63 68 69 6E 6F 6F 6B 00 81 80 3D 11 |...~chinook...=.|
000890: 00 00 06 08 00 77 73 69 00 81 80 44 0A 00 00 15 |.....wsi...D....|
0008A0: 10 00 7E 77 61 6E 69 67 61 72 2F 73 70 61 63 65 |..~wanigar/space|
0008B0: 6C 69 6E 6B 00 81 0D 0A 64 00 00 10 09 00 7E 74 |link....d.....~t|
0008C0: 68 67 72 69 65 73 2F 64 69 73 63 00 81 89 C2 89 |hgries/disc.....|
0008D0: 00 02 81 89 21 25 02 40 81 0F 02 5A 00 00 07 40 |....!%.@...Z...@|
0008E0: 00 6F 75 70 64 19 48 00 7E 6E 77 73 2F 73 70 6F |.oupd.H.~nws/spo|
0008F0: 74 74 65 72 67 75 69 64 65 2E 68 74 6D 6C 00 81 |tterguide.html..|
000900: A4 28 6C 10 02 81 A4 28 DF 80 00 81 A4 28 E1 10 |.(l....(.....(..|
000910: 82 81 B1 0C 0C 00 00 0F 40 02 70 65 6F 70 6C 65 |........@.people|
These look like URL fragments, but they also look sort of haphazard.
In fact we theorized at one point that they might be stray garbage
from memory allocation calls. However, they do have a purpose, and
once we had the format of the cyber.not file, the cyber.yes file
became easy to figure out.
The same correlation-counting program that we ran on cyber.not showed
similar results on cyber.yes, with strong correlation at a distance of
six characters, but unlike cyber.not, no sharp peak at seven
characters. This suggested that the format for the main table in
cyber.yes would be very similar to that of cyber.not. Examination of
the hex dump showed similar stretches of six-byte repeats with a field
incrementing in big endian.
A little trial and error revealed that the format is essentially
identical: records with IP addresses and two-byte "mask-like" fields.
We say mask-like because it's not clear that they serve the same
function as the mask fields in cyber.not. When the mask-like field is
zero, there follows some number of variable-length URL records,
terminated by a zero byte. There are two significant differences in
the subrecord format. First, the URL is in plain text instead of being
hashed. As a result, the variable length can assume a less restricted
set of values. Second, the "mask" field appears to have a different
significance. Here is a sample record from cyber.yes:
202.231.128.32:
0802 "home/dbec1"
5A8A "home/kazoo"
5A8A "home/kiboc"
5A8A "home/kimin"
5A8A "home/sanyohs"
7ACA "home/terada"
7AEA "home/tomoy"
7AEA "home/tomoyuki"
7BFA "home/ueno"
7BFA "home/warp"
The hexadecimal column is the field that in cyber.not would be the
blocking mask. Here, it's not clear what it is. It could be some kind
of anti-blocking mask, of categories NOT to block, but then it's
surprising that it would be in sorted order (a pattern that persists
in other records too), especially when the URLs are also in
alphabetical sorted order. Other possibilities for this field include
some kind of time stamp, a serial number, an index pointer, an
authentication token or hash, or random memory garbage. The
"mask-like" fields on IP addresses similarly show little apparent
design, except that (just as in cyber.not) a zero value indicates the
presence of URL subrecords. The newsgroup list has mask-like fields
too, and there's no immediately obvious meaning to the data in them.
At this point we should note the overall file structure of cyber.yes.
Unlike cyber.not, which had an elaborate header, the header on
cyber.yes consists of just three bytes: one version number (or
possibly encryption key fixup), and two bytes giving the length of the
URL table. We discovered this by working backwards from the URL table
until we found that all the bytes in the file except the first three
made sense as part of the URL table. The newsgroup list follows
immediately after the URL table and continues until the end of the
file, in the same format as the cyber.not newsgroup list except with
unknown data where the blocking mask would go. Unlike the tables in
cyber.not, both tables in cyber.yes are just bare data, with no "SD"
and "ED" delimiters.
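To make the cyber.yes record format concrete, here is a minimal
Python sketch that walks the URL table. It assumes the two-byte
mask-like fields are little endian, as in cyber.not, and that the
length byte counts itself; the function name is hypothetical. The
sample bytes are taken from the HEXEDIT dump above, starting at
offset 00088C.

```python
import struct

def parse_yes_table(data: bytes):
    """Yield (ip, field, urls); each url is (mask-like field, text)."""
    pos = 0
    while pos + 6 <= len(data):
        ip = ".".join(str(b) for b in data[pos:pos + 4])
        (field,) = struct.unpack_from("<H", data, pos + 4)
        pos += 6
        urls = []
        if field == 0:
            # Zero field: plain-text URL subrecords follow, ended by 0x00.
            while pos < len(data) and data[pos] != 0:
                length = data[pos]
                if length < 3 or pos + length > len(data):
                    raise ValueError(f"bad subrecord length {length} at {pos}")
                (sub_field,) = struct.unpack_from("<H", data, pos + 1)
                text = data[pos + 3:pos + length].decode("latin-1")
                urls.append((sub_field, text))
                pos += length
            pos += 1  # skip the terminating zero byte
        yield ip, field, urls

sample = bytes.fromhex(
    "81803D110000" "060800777369" "00"
    "8180440A0000" "1510007E77616E696761722F73706163656C696E6B" "00"
    "810D0A640000" "1009007E746867726965732F64697363" "00"
    "8189C2890002"
)
yes_records = list(parse_yes_table(sample))
for record in yes_records:
    print(record)
```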
This file structure is interesting because it seems stripped down or
simplified from the structure of cyber.not. It would be reasonable to
guess that the cyber.yes format was a quick hack retrofitted onto the
product subsequent to the more carefully-designed cyber.not table.
It's also possible that the cyber.not format proved too complicated
and cyber.yes is an example of a "leaner and meaner" file format,
still keeping to the same design principles as cyber.not and likely
re-using a lot of code originally written for cyber.not.
Following are the relevant structure tables. This concludes the
section on reversing the file formats.
6.1 Structure tables
TNotHeader
Offset  Size  Description
0x0000  2     Filetype? (0x00FC)
0x0002  2     Header size (0x002A)
0x0004  2     Header id ('CH' or 'HH')
0x0006  2     unknown ( 00 00 )
0x0008  2     unknown ( 00 00 )
0x000A  2     unknown ( 03 01 )
0x000C  2     Count of TNotHeaderEntries (0x0003)
Immediately followed by one or more of these:
TNotHeaderEntry
Offset  Size  Description
0x0000  2     Table type ( 4x 00)
0x0002  4     Absolute offset
0x0006  4     Size (in bytes)
The problem here is the Table Type field, for which we have too little
data to assign meanings with any certainty. From the files analysed so
far, we can only tabulate the type values that occurred and the kind
of data each one pointed to.
TNotTableType
Value   Binary     Description
0x0041  0100 0001  Points to TNotIPEntries in cyber.not
0x0047  0100 0111  Points to TNotNewsEntries in hotlist.not
0x0049  0100 1001  Points to TNotURLEntries in cyber.not and hotlist.not
0x004E  0100 1110  Points to TNotNewsEntries in cyber.not and hotlist.not
0x004F  0100 1111  Points to TNotURLEntries in hotlist.not
We can make no detailed conclusions from so little data.
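Reading TNotHeader and its TNotHeaderEntry list takes only a couple of
struct calls. This is a minimal Python sketch that assumes every
multi-byte field is little endian (which matches the 0x2A at offset
0x02 and the 0x00030103 reading earlier). The function name is
hypothetical, and the sample header is synthetic: its table offsets
and sizes are made up for the example, not taken from a real cyber.not.

```python
import struct

HEADER_FMT = "<HH2s6sH"  # filetype, header size, id, 6 unknown bytes, count
ENTRY_FMT = "<HII"       # table type, absolute offset, size in bytes

def parse_not_header(data: bytes):
    """Parse TNotHeader followed by its TNotHeaderEntry records."""
    filetype, header_size, header_id, unknown, count = struct.unpack_from(
        HEADER_FMT, data, 0
    )
    pos = struct.calcsize(HEADER_FMT)  # 14 bytes, i.e. offset 0x0E
    entries = []
    for _ in range(count):
        entries.append(struct.unpack_from(ENTRY_FMT, data, pos))
        pos += struct.calcsize(ENTRY_FMT)  # 10 bytes per entry
    return {
        "filetype": filetype,
        "header_size": header_size,
        "id": header_id,
        "unknown": unknown,
        "entries": entries,
    }

# Synthetic header: real field layout, invented offsets and sizes.
sample_header = struct.pack(
    HEADER_FMT, 0x00FC, 0x002A, b"CH", bytes([0, 0, 0, 0, 3, 1]), 3
)
for table_type, offset, size in ((0x49, 0x2C, 100), (0x4E, 0x90, 50),
                                 (0x41, 0xC2, 20)):
    sample_header += struct.pack(ENTRY_FMT, table_type, offset, size)

parsed = parse_not_header(sample_header)
print(parsed)
```

The header size of 0x2A agrees with this layout: 12 bytes from offset
0x02 to the end of the count, plus three 10-byte entries.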
TNotIPEntry
Offset  Size  Description
0x0000  4     IP
0x0004  1     Count of additional IP addresses (typically 1-23)
0x0005  *     IP x count
TNotURLEntry
Offset  Size  Description
0x0000  4     IP Address
0x0004  2     Category blocking mask or 0x0000 to indicate a subrecord
              follows
Subrecord
0x0000  1     Subrecord size
0x0001  2     Category blocking mask
0x0003  *     URL hash
In the case where there are one or more subrecords, the list is
terminated by a zero byte.
TNotNewsEntry
Offset  Size  Description
0x0000  1     Record size
0x0001  2     Category blocking mask
0x0003  *     Newsgroup string
Now, for the cyber.yes:
TYesHeader
Offset  Size  Description
0x0000  1     Filetype? (0xFB)
0x0001  2     Count of TYesURLEntries
This is the only record-type of the cyber.yes:
TYesURLEntry
Offset  Size  Description
0x0000  4     IP Address
0x0004  2     Unknown, or 0x0000 to indicate a subrecord follows
Subrecord
0x0000  1     Subrecord size
0x0001  2     Unknown
0x0003  *     URL as plaintext
As with TNotURLEntry, where there are one or more subrecords, the
list is terminated by a zero byte.
7 Observations
With all these technical things resolved, let's look at the data
itself. First a table of statistics pulled from two different CyberNOT
files:
Cyber Patrol URL Database Statistics
Bit Category 1999-04-29 2000-02-20 Change
0 Violence / Profanity 1201 1407 +206 (17%)
1 Partial Nudity 46538 72236 +25698 (55%)
2 Full Nudity 45013 70248 +25235 (56%)
3 Sexual Acts / Text 47769 74009 +26240 (54%)
4 Gross Depictions / Text 1414 2273 +859 (61%)
5 Intolerance 259 337 +78 (30%)
6 Satanic or Cult 129 197 +68 (53%)
7 Drugs / Drug Culture 197 306 +109 (55%)
8 Militant / Extremist 187 204 +17 (9%)
9 Sex Education 201 270 +69 (34%)
A Questionable / Illegal & Gambling 1347 1928 +581 (43%)
B Alcohol & Tobacco 783 1155 +372 (48%)
C Reserved 4 48 3 -45 (-94%)
D Reserved 3 0 0 0 (0%)
E Reserved 2 0 0 0 (0%)
F Reserved 1 0 0 0 (0%)
Total URL masks 52315 79899 27584 (52%)
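The Bit column maps directly onto the 16-bit blocking mask, with bit 0
as the least significant bit (so that 0x000E is the three "porn"
categories). A small sketch of decoding a mask into category names;
the function name is ours, not something taken from Cyber Patrol.

```python
# Category names indexed by bit number, taken from the statistics table.
CATEGORIES = [
    "Violence / Profanity", "Partial Nudity", "Full Nudity",
    "Sexual Acts / Text", "Gross Depictions / Text", "Intolerance",
    "Satanic or Cult", "Drugs / Drug Culture", "Militant / Extremist",
    "Sex Education", "Questionable / Illegal & Gambling",
    "Alcohol & Tobacco", "Reserved 4", "Reserved 3", "Reserved 2",
    "Reserved 1",
]

def decode_mask(mask):
    """Return the names of all categories whose bit is set in `mask`."""
    return [name for bit, name in enumerate(CATEGORIES) if mask & (1 << bit)]

porn = decode_mask(0x000E)
all_but_sex_ed = decode_mask(0x0FFF & ~(1 << 9))
```

With this numbering, a site "blocked under all categories except sex
education" carries the mask 0x0DFF.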
We can see that of the roughly 80000 entries, about 90% fall into one
or more of the pornography categories. The Learning Company have a
page on their site describing their criteria for categorizing entries.
At the end it states: "Note: Web sites which post "Adult Only" warning
banners advising that minors are not allowed to access material on the
site are automatically added to the CyberNOT list in their appropriate
category.". This may give the impression that sites are automagically
added as soon as they appear on the web, which certainly isn't the
case. They are most probably using a web spider to pick these up.
These spidered sites probably make up the bulk of the URLs flagged in
all of categories 1, 2 and 3, which is the dominant set of flags by
far. By monitoring these statistics for a longer period of time one
could deduce how effective the spider is in finding new sites. The
oldest cyber.not we have available is dated 1999-04-29. By comparison
it contains only 52315 entries, but the ratio of "porn" rated sites is
the same, about 89%, with 46538, 45013 and 47769 entries flagged for
categories one, two and three respectively. Most of the other
categories are up by between a hundred and three hundred entries, but
the porn categories, suspected mostly to consist of spidered sites,
are up by about 25000 entries each for the period (about 42 weeks).
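To put a number on the spider's output, one can divide the growth of
each porn category by the time between the two list dates. A small
sketch of that arithmetic, using only the dates and counts from the
statistics table; the function and variable names are ours.

```python
from datetime import date

def weekly_growth(old_count, new_count, start, end):
    """Average number of entries added per week between two list dates."""
    weeks = (end - start).days / 7
    return (new_count - old_count) / weeks

start, end = date(1999, 4, 29), date(2000, 2, 20)

# Entry counts for bits 1-3, copied from the statistics table.
porn_counts = {
    "Partial Nudity": (46538, 72236),
    "Full Nudity": (45013, 70248),
    "Sexual Acts / Text": (47769, 74009),
}
rates = {
    name: weekly_growth(old, new, start, end)
    for name, (old, new) in porn_counts.items()
}
for name, rate in rates.items():
    print(f"{name}: about {rate:.0f} new entries per week")
```

Repeating the calculation for each new list release would show
whether the rate of additions is rising or falling.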
There is a function in CP where a user can use a form to report new
URLs for consideration of inclusion into the CyberNOT. It would be
interesting to know how many of the URLs added come in this way. It
would be possible for users to team up and exchange URLs on their own,
bypassing The Learning Company, which is charging for these CyberNOT
updates. By patching the CP executable it could be made so that this
report form is posted to another server, which could also host updated
CyberNOT lists. It would take a little work to set up, but not too
much. The most difficult aspect would probably be to reach out to
active Cyber Patrol users and convince them that this would be
worthwhile, especially since it would require a certain amount of
momentum to be worthwhile at all. With this threat, it's logical to
assume that The Learning Company and other censorware vendors will use
even more security-through-obscurity in future products, to deter the
threat of having one of their sources of income bypassed.
Near the start of this essay we mentioned the "reserved" blocking
categories. Cyber Patrol, in addition to the twelve documented
blocking categories, has an additional four (labelled "Reserved 1"
through "Reserved 4") which are greyed out. Reserved 3 and Reserved 4
are selected by default and, being greyed out, cannot be disabled -
even by the administrator.
Any sites placed in one of those two categories will be blocked no
matter what. We found three examples on the now-current CyberNOT list.
All three are in Japanese. They were each blocked in Reserved 4 and no
other categories; we could not find any examples of blocks on other
reserved categories.
* http://133.205.62.133/~coga/, which appears to say something like
"This domain has moved".
* http://202.26.1.170/~mcqueen/, which is mostly in Japanese but
includes the English text "The page you requested was not found".
* Tsutomu Notani's home page, which based on the pictures appears to
include some content about horse racing, and thus (presumably)
gambling. No other blockable content is immediately apparent.
There are a few entries in the CyberNOT list that are blocked under
all non-reserved categories. For instance, the anti-censorware site of
Peacefire is listed as containing "Violence / Profanity, Partial
Nudity, Full Nudity, Sexual Acts / Text, Gross Depictions / Text,
Intolerance, Satanic or Cult, Drugs / Drug Culture, Militant /
Extremist, Sex Education, Questionable / Illegal & Gambling, Alcohol &
Tobacco". That's not such a surprise; blocking Peacefire has become
traditional among censorware manufacturers.
The other sites blocked under all categories seem to be translation
and anonymizer services; any site where you can type in a URL and it
will present you a copy of that page. That's probably no big surprise
either, because such sites can be used to circumvent censorware. So it
may be reasonable that sites like anonymizer.com should be blocked
under all categories; potentially, they do make available the entire
range of human thought. Not all these blocks are carefully applied,
however; the "STOP KITTY PORN" page (which features a picture of a
very bored-looking house cat) is blocked under all categories
apparently just for containing a link to anonymizer.com. Here, as
elsewhere, the blocking list doesn't seem to be updated very
frequently. The server at 207.55.200.2 (whose reverse-DNS resolves to
"www.live4u.com", although that doesn't resolve in the forward
direction) seems to be an ordinary portal site, with no obvious
translation service, but it's blocked for everything except sex
education.
Of course, the most interesting things we could find on the blocking
list would be sites about political or social issues. Other censorware
packages have gotten in a lot of trouble, for instance, by blocking
sites like the National Organization for Women, and a great many gay
and lesbian sites. The CyberNOT list seems relatively free of that
kind of political agenda, which could be a good or a bad thing
depending on your point of view. If the software is to be installed in
public libraries, it's good that it won't block these
politically-important sites. Of course, it would be better if it
didn't block any sites at all. On the other hand, if you were a parent
who considered feminism or homosexuality to be unimaginably horrid
subjects, then you might feel ripped off by Cyber Patrol's not
blocking the high-profile sites.
Let's take a closer look at the "Intolerance" category. While they do
block smaller sites, such as this one on atheism, which we feel is
relatively benign, they also block such high-profile sites as
www.godhatesfags.com and part of the American Family Association's
site, whose views on homosexuality cannot be described as anything
but intolerant. The AFA is one organization pushing for the
installation of censorware in US libraries. One can only assume they'd
prefer one of Cyber Patrol's competitors.
Some other sites in this category:
* Matthew R. Galloway's homepage. Contains the word "Voodoo" in a
reference to voodoo-cycles.com, and a pretty famous joke file
entitled Top 10 Reasons Why Beer Is Better Than Jesus. No #1 being
"If you've devoted your life to Beer, there are groups to help you
stop.", BTW.
* Misha Verbitsky's old homepage. Seems perfectly ordinary. Some
papers, a couple of usenet archives. Note that this page was
frozen several years back, so whatever it was censored for is
still there.
* Church of the SubGenius. Banned in every category except sex-ed.
The Church is a spoof of fundamentalist Christianity, consumer
culture, and other things.
* joc.mit.edu/cornell/. This link is for the archive containing
files relevant to:
The Justice on Campus Project's mission is to preserve free
expression and due process rights at universities. Our online
archive includes reports on disciplinary charges, speech codes, and
censorship on college campuses around the country. The Project was
one of 20 plaintiffs in the ACLU's successful challenge of the
Communications Decency Act.
How very intolerant of them to be working for free speech, huh?
How about some examples from the category "Satanic or Cult"?
* Mega's Metal Asylum. Miika "Mega" Kuusinen's page of Metal music.
Articles, links. Perfectly ordinary. Tagged as militant, too.
Well, we all know how metal music is the devil's work.
* This site contains nothing but the text "Welcome!". If that's
enough to be branded a "Satanist", we can expect a rapid growth in
bans. If nothing else, this is another example of how the bans
grow outdated as time goes by, but The Learning Company doesn't
seem to care much.
* webdevils.com - "Experiments with sound", a site which has nothing
to do with religion, or lack of it. Guess the hostname was enough
in this case.
There is one political issue the CyberNOT list doesn't shy away from:
that of nuclear disarmament. All sites relating in any way to war,
bombs, explosives, or fireworks, both for and against, seem to be
eligible for blocking as "Militant / Extremist". Most are also classed
as "Violence / Profanity" and "Questionable / Illegal & Gambling",
whether those categories seem to apply or not. For instance:
* The Nuclear Control Institute. From the blocked page:
Founded in 1981, the Nuclear Control Institute (NCI) is an
independent research and advocacy center specializing in problems
of nuclear proliferation. Non-partisan and non-profit, we monitor
nuclear activities worldwide and pursue strategies to halt the
spread and reverse the growth of nuclear arms. No Bomb! In
particular, we focus on the urgency of eliminating atom-bomb
materials ---plutonium and highly enriched uranium---from civilian
nuclear power and research programs.
Is that an extremist position?
* A personal site including a lot of different material, apparently
blocked for something called "The Nazism Exposed Project". From
the blocked page:
Nazism, fascism and extreme nationalism are today at its highest
peak since the destruction of Hitler's dictatorship in 1945. Today,
all over the world, fascists and extreme nationalists win millions
of votes on their simple racist solutions to very complex problems
of the society. In the streets, Nazi boneheads are spreading fear
by using murderous violence and terror. These fascist groups blame
the cultural and ethnic minorities for the problems in our society.
These individuals, and their political leaders, are a threat to our
democracy, and to everything that is decent.
Blocked as "Violence / Profanity, Militant / Extremist,
Questionable / Illegal & Gambling".
* Anti-nuclear-bomb articles from the Tri-City Herald newspaper,
blocked as "Violence / Profanity, Militant / Extremist,
Questionable / Illegal & Gambling".
* One page in this directory (URL hash not fully reversed) on the
City of Hiroshima Web site, blocked as "Violence / Profanity,
Militant / Extremist, Questionable / Illegal & Gambling".
* Jim Lippard's home page, which contains some anti-Scientology
material and a link (not text) to this Salon article about the
Littleton shootings, which everyone ought to read.
* Cheesehead Central, a personal home page, which contains a few
links relating to fireworks displays and therefore, apparently,
qualifies as "Violence / Profanity, Militant / Extremist,
Questionable / Illegal & Gambling".
* The former location of the American Airpower Heritage Museum - an
apparently-legitimate museum of US combat aircraft. Blocked as
"Violence / Profanity, Militant / Extremist, Questionable /
Illegal & Gambling".
Some sites that may be blockable under a few categories are also
blocked under a great many other categories. For instance:
* Teen Babe of the Month; it's a porn site, but it appears to be a
perfectly ordinary porn site. Blocked under all categories except
sex education.
* http://www.xs4all.net/~stones/, a link (not the actual site
itself) pointing at a warez search engine. That would presumably
qualify as "Questionable / Illegal", but it's flagged for
everything except sex education.
* http://www.danland.engelholm.se/, a personal home page. Some
content relating to warez, but nothing else blockworthy is
immediately apparent. Blocked for everything except sex education.
* The Marston Family Home Page, with the usual round of pictures of
Mom, Dad, the kids, the dog, etc. Entire directory blocked for
"Militant / Extremist, Questionable / Illegal & Gambling",
apparently just because of this paragraph in young Prescott's
section:
In school they teach me about this thing called the Constitution
but I guess the teachers must have been lying because this new law
the Communications Decency Act totally defys [sic] all that the
Constitution was. Fight the system, take the power back, WAKE
UP!!!!!
You go, boy.
It is obvious on examining the list that many entries haven't been
updated or checked in a long time. Many sites that are blocked now
give 404 not found errors, or redirects to new locations that are not
blocked. Changes to Web sites may also account for some of the
inappropriate category labelling. Here are some samples of sites that
seem inadequately reviewed:
* An empty page blocked in all categories except sex education, and
a 404 not found page blocked in all categories including sex
education. There are many others like these.
* A student home page at utexas.edu, blocked for "Violence /
Profanity, Partial Nudity, Full Nudity, Sexual Acts / Text,
Militant / Extremist, Questionable / Illegal & Gambling" content.
It consists mostly of (clothed) photos of the author's baby son,
with no blockable content immediately apparent.
* Another student home page at imsa.edu, blocked as "Violence /
Profanity, Militant / Extremist, Questionable / Illegal &
Gambling". Consists solely of a link to the author's resume, which
is perfectly ordinary.
* A personal home page at world.std.com. The part about his wife is
nauseatingly sweet, but doesn't really fit most people's
definitions of "Gross Depictions / Text, Militant / Extremist,
Questionable / Illegal & Gambling", which is what it's blocked
for.
* A sheet-music publisher, blocked as "Violence / Profanity,
Militant / Extremist, Questionable / Illegal & Gambling" for no
apparent reason.
These are just a few examples of sites that Cyber Patrol is banning,
or was. It is not unthinkable that they might lift a few after this is
published. We've only scratched the surface as far as checking on the
sites that are banned. Going through even a few hundred takes a lot of
time, and with almost 80,000 bans in effect, the work required to
check them all would be enormous. We don't have time to do it, but
since The Learning Company is making money from the supposed
correctness of the list, they ought to be able to find resources to
check the list from time to time.
We know they are banning 80,000 or so URLs, but most censorware
packages also have a database of words that are not allowed to exist
in incoming pages, because it's the only way to really approach being
effective in banning new pages on the ever evolving and growing
Internet. Cyber Patrol doesn't do that, and so its IP and URL bans are
its only real line of defence. If you can find a site that The
Learning Company have not, then there's very little stopping you from
browsing it. There is the function that can filter a site based on
substrings in the URL itself, but that is it.
Cyber Patrol is actually fairly efficient in blocking sites if you
don't know how to search effectively. If you simply search one of the
major search-engines then you will probably draw a blank, because it's
very likely that that is the exact kind of search used by The Learning
Company to bait their web-spiders. However, finding a few pages with
obscene banners and thumbnail pictures is no big problem. We could
locate this one and this one in short order. One somewhat effective
method is to search for non-English language pages. The spider might
not be effective in locating and parsing these for automatic inclusion
in the CyberNOT. You could for instance look for a Swedish site, and
locate www.smygis.com, which is not - as this is written - blocked in
any way. If you really want porn, Cyber Patrol might slow you down a
little, but it won't cut you off entirely.
7.1 Rogue deinstallation
Apart from checking for "unauthorized" modifications to cyberp.ini,
CP's "advanced anti-hacker security" consists of a new
%windir%\system\system.drv that checks for the existence of the
modules PROGIC, PROGICS and TS. These are represented by the files
IC.EXE, ICFIRE.EXE and TS.DLL, all in %windir%. The original
system.drv is cleverly hidden away as %windir%\system.386.
The modules are loaded in two ways: first, there is a load entry in the
win.ini file, and second, there's an entry in the registry at
HKCU\Software\Microsoft\Windows\CurrentVersion\Run called
"FltProcess", which will load %windir%\system\msinet.exe, which in
turn will load the Cyber Patrol modules. The CP version of system.drv
will halt the loading of Windows if it doesn't find its modules, and
ask you to call their support number. Once that system.drv has been
replaced, you can safely do away with the registry entry, the load key
in win.ini and any of the numerous binaries. Because of the many files
CP installs to your system, we suggest you use the normal uninstaller
instead. Not that it does a very good job of removing its system
files, but there you go.
Optionally, if you come across an installation running unregistered,
you can use the backdoor password omed to uninstall, or simply to gain
administrator access.
8 Source and binaries
We have developed a set of software for getting around Cyber Patrol.
People oppressed by Cyber Patrol will want to take a look at CPHack, a
Win32 binary which will decode the userlist for you, and also let you
browse the different banlists.
Also available is C source for two command-line programs illustrating
the cryptographic attacks on cyber.not (cndecode.c) and the HQ
password hash (cph1_rev.c). These programs were written under Linux
and are not guaranteed to work anywhere else.
A complete package with this essay, the binaries, and various sources
and related files is available as cp4break.zip (~360Kb).
8.1 CPHack documentation
This tool is not particularly hard to use, but some comment on its use
could be in order. First of all the author would like to state that
this is a hack(1), which is reflected both in the state of the source
and the user interface. The basic functionality is to let you load and
browse the information of a Cyber Patrol .not file and/or the user
information contained in a cyberp.ini file. Simply select which you
want to load using the file menu. Also in the file menu are functions
for importing and exporting hosts. By importing hosts you are reading
a text file containing lines of IPs and their corresponding hostnames
into the treeviews. Export, of course, does the opposite.
Continuing we have the functions "Export dictionary" which will
traverse the treeviews and write out all words that have been assigned
to URL-hashes. "Export unresolved IPs" does just that; it could be
used to distribute the work of doing reverse-lookups. The final export
function is "Export URL hashes", which will export any hash that has
not been assigned a word, the logical inverse of the "Export
dictionary" function.
Maybe the most useful functions are the last ones: "Generate report",
which will output an HTML document reflecting the data you have loaded
(be sure to check out the "Configuration" tab before doing that,
though), and the somewhat mysterious "Cull dictionary by hash". The
latter function will take the main dictionary (as defined in the
configuration tab), and create a new dictionary containing only the
words with hashes contained in a .not file you have loaded. A bit of
explanation on this: It was thought by the author that a lazy
dictionary attack would be enough. This lazy approach is what you get
if you select one of the attacks available by right-clicking a node.
However, this proved quite slow when used with large dictionaries
(15Mb or so), as it only looks at one URL at a time.
The problem here is that CPHack will try - for each node - lots of
words from the dictionary with hashes that don't exist in the
database at all. As a quick hack on the hack, this function was
implemented, which will take all the hashes in the database and attack
them all at once. The downside is that no references are kept as to
which exact nodes the found hashes belong to, so you will only get a
new optimized dictionary to use in the lazy attack; you won't get an
instant update to the treeview. While desirable, it would take too
much time and effort - at this point - to implement correctly. A good
implementation would traverse the nodes you have selected, creating an
ordered list of unique hashes, attached to which would be lists of all
associated nodes. When the hash of a word is found in this ordered
list of hashes, the correct chain of tree nodes could be quickly
traversed and nodes updated to reflect the hit. Until this is fixed,
you should cull the dictionary first, and use the output with the lazy
attack, to "assign" all words into the database.
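The difference between culling and the "good implementation" outlined
above can be sketched in a few lines. This is an illustration only,
not CPHack's code: zlib.crc32 is a stand-in for the real URL hash
function, the node identifiers are hypothetical, and a Python dict
takes the place of the ordered list of unique hashes.

```python
import zlib
from collections import defaultdict

def url_hash(word):
    # Stand-in only: substitute the real URL hash function here.
    return zlib.crc32(word.encode("ascii"))

def cull_dictionary(words, database_hashes):
    """Keep only the words whose hash occurs somewhere in the database."""
    wanted = set(database_hashes)
    return [w for w in words if url_hash(w) in wanted]

def build_hash_index(nodes):
    """Map each unique hash to every node carrying it.

    `nodes` is an iterable of (node_id, hash) pairs.
    """
    index = defaultdict(list)
    for node_id, h in nodes:
        index[h].append(node_id)
    return index

def attack_all(words, index):
    """One pass over the dictionary; return {node_id: word} for every hit."""
    hits = {}
    for w in words:
        for node_id in index.get(url_hash(w), ()):
            hits[node_id] = w
    return hits

# Hypothetical nodes; two of them share the same hash.
nodes = [("n1", url_hash("index.html")), ("n2", url_hash("pics")),
         ("n3", url_hash("index.html"))]
words = ["foo", "index.html", "bar", "pics"]
culled = cull_dictionary(words, [h for _, h in nodes])
hits = attack_all(words, build_hash_index(nodes))
```

The culling step only shrinks the dictionary, while the index also
remembers which nodes each hash belongs to, so every hit can be
written back to all of its nodes in a single pass.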
The main interface contains the five sections "Users", "Newsgroups",
"URL database", "IP Aliasing" and "Configuration". A quick rundown
follows.
If you load a cyberp.ini the "Users" tab will display the names and
passwords of the users therein, including the passwords of the innate
administrator and deputy accounts.
After loading a CyberNOT file, the "Newsgroups" tab will display all
filters defined therein. To the right is a panel of checkboxes which
you cannot operate, but which will reflect the masks applied to the
newsgroup entry you select in the listview.
Next we have the "URL database" tab, which contains a treeview where
you can browse the database. It should be noted that the relatively
long loading time of a CyberNOT file is due to the way the treeview
works, with insertion into a branch - apparently - being O(n) rather
than O(1) in the number of siblings of a new node. Anyway, you
can browse the view in the normal manner of things. There are three
different types of nodes, the first being called internally a "net
node". This is simply a root node containing all entries for IPs of an
"A net". Below these are "IP nodes" which are the IPs that are banned
by the database. Some of these have children of their own, being "URL
nodes" which contain the hashes of specific paths and resources being
banned. You can right-click on any one of these three types of nodes
for additional context sensitive functionality, such as "Open",
"Lookup" and "Dictionary attack". As with the newsgroups tab, there is
a panel of checkboxes which will reflect the masking status of the IP
or URL you select. At the bottom is a quick search bar where you can
do case sensitive string searches.
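The three node levels can be modelled without a GUI treeview. The
sketch below is not how CPHack stores its data; it reads "A net" as
the first octet of the address, and uses nested Python dicts, where
inserting a new sibling takes constant time on average rather than
time proportional to the number of existing siblings.

```python
def build_tree(entries):
    """Group (ip, url_hashes) pairs into {net: {ip: [url hashes]}}."""
    tree = {}
    for ip, url_hashes in entries:
        net = int(ip.split(".")[0])        # "net node": first octet of the IP
        # "IP node" keyed by address; its list holds the "URL nodes".
        tree.setdefault(net, {})[ip] = list(url_hashes)
    return tree

# Made-up entries; the hash values are placeholders.
entries = [
    ("192.0.2.1", [0x1234]),
    ("192.0.2.7", []),
    ("10.1.2.3", [0xAAAA, 0xBBBB]),
]
tree = build_tree(entries)
```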
There's not much to say about the "IP Aliasing" tab, but here too you
can right-click for additional functionality.
Finally we have the configuration tab where you define the different
dictionaries you want to use, and a number of other things which are
self-explanatory, except maybe for the "Lock found URLs". This
function, if enabled, makes sure that once a word has been found to
match a hash and been attached to it in the treeview, then it will
never get replaced even if another possible candidate is found.
This program is entirely self contained. It will not write to the
registry, and it will not create files anywhere but in its own
path, unless you say it can.
The source is included, and you can do whatever you want with it.
9 Conclusion
On the good side, we note that Cyber Patrol is - technically -
somewhat better than NetNanny and CyberSitter, the two other
censorware packages we have intimate knowledge of, but there is still
far too much 16-bit code for it to be really stable and earn a good
grade.
We see no evidence of a clear political or religious agenda behind
Cyber Patrol, though as citizens of highly secularized countries we
might feel that many of the bans in the "Satanic or Cult" category are
unreasonable. Their criteria document says "Satanic material is
defined as: Pictures or text advocating devil worship, an affinity for
evil, or wickedness." and "A cult is defined as: A closed society
[...] Common elements may include: [...] influences that tend to
compromise the personal exercise of free will and critical thinking."
LaVey Satanism - for instance - isn't about any of the things in the
full definition, and atheism certainly isn't, but such sites are
included in the CyberNOT.
The evidence points to the CyberNOT list not being properly updated to
remove old and outdated entries. As many as 50% of the IPs in the list
don't even resolve! When evaluating a product with a ban list, you
should not look at the number of entries, but the number of current
entries. Simply collecting new entries, and using the ever growing
(but outdated) list of bans as an argument in the sales game, is much
easier than actually putting in work to ensure the list is up to date
and accurate.
The old classic tactic of entering critics into the banlist continues,
with the banning of Peacefire in almost every category available. When
the producers are knowingly banning a site in clearly the wrong
categories, then what kind of trust can you put in them and their
products? None. We must continue to reverse-engineer these products so
that consumer rights can be protected. Will we ever find a censorware
company who are not lying to us with these false bans?
The absence of filtering based on content keywords is surprising, but
welcome. The technology does not exist to make content-based filtering
really functional. The problem of recognizing content and making
choices based on context is a hard one, suitable for research by the
AI-labs. But it is a two-edged sword. The price of leaving this
error-prone functionality out is that it makes Cyber Patrol less
effective in blocking pages not previously processed by The Learning
Company.
After all this, the feeling is that CP is just another censorware
package. It tries hard to come across as effective - the magical
technical solution to a non-technical problem - but when push comes to
shove, it yields to the power of the human mind. If you thought
putting this between your children and the Internet would protect them
from "dangerous" ideas, then you'd better think again.
9.1 Thanks
We would like to thank all the fine men and women working for civil
liberties all over the world.
Matthew would like to thank: the goddess Pele for favours received,
and the Canadian government for supporting my cryptographic interests
in several ways. Greetings to all the people I hang out with in
sci.crypt, alt.kids-talk, talk.bizarre, and the VLUG and Voynich
mailing lists.
Eddy would like to thank: Robert Risberg, Kristoffer Andergrim,
Mattias Aspman, Gunnar Rettne, and all of my friends around the world.
Special regards to all the intelligent, knowledgeable and humorous
folks of R20 of the Fidonet - you know who you are.
All cryptanalysis done by Matthew Skala. Reverse Engineering done by
Eddy L O Jansson and Matthew Skala. Feel free to contact the authors
with your comments and/or questions.
This essay was first published at Eddy's homepage on 2000-03-11. You'll
find Matthew's homepage here.
You are allowed to mirror this document and the related files anywhere
you see fit.
10 References
[DFR98] Saruman and Bobban, "The Penetration of CyberSitter'97", Apr
1998.
[DFR99] Saruman, "The Reversal of NetNanny", Aug 1999.
[ACLU96] American Civil Liberties Union "FCC V. Pacifica Foundation",
1996.
[RNW93] Ross N. Williams "A painless guide to CRC error detection
algorithms", Aug 1993.
[JRG00] Raphael Finkel, Eric S. Raymond, et al. "The on-line hacker
Jargon File, version 4.2.0", Jan 2000.
(c)2000 Eddy L O Jansson and Matthew Skala. All rights reserved. All
trademarks acknowledged.
[END]